Silence energy normalization for robust speech recognition in additive noise environment
Abstract
The energy parameter has been widely used as an extension to the basic mel-frequency cepstral coefficient (MFCC) features to improve speech recognition accuracy. In this paper, a simple and effective approach to energy normalization of the silence (non-speech) portions of an utterance is proposed. This approach, named silence energy normalization (SEN), uses the high-pass filtered log-energy as the feature for speech/non-speech classification; the log-energy of non-speech frames is then set to a small constant, while that of speech frames is kept unchanged. In experiments conducted on the AURORA2 database, we show that SEN provides an average word error rate reduction of 34.9% and 44.6% for Test Sets A and B, respectively, compared with the baseline processing. We also show that SEN outperforms similar approaches such as energy subtraction (ES) and feature vector selection (FVS). Finally, we show that SEN can be integrated with cepstral mean and variance normalization (CMVN) to further improve recognition performance.
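The SEN procedure described in the abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the high-pass filter, the speech/non-speech decision rule, or the value of the "small constant", so the first-order IIR filter `y[n] = x[n] - x[n-1] + a*y[n-1]`, the use of the utterance mean of the filtered contour as the threshold, and the default `silence_value` are all hypothetical choices. All function names are likewise hypothetical. The `sen_then_cmvn` helper shows one assumed way to combine SEN with CMVN: SEN is applied to the log-energy column first, then per-utterance mean and variance normalization is applied to every feature dimension.

```python
import numpy as np


def log_energy(frames: np.ndarray, floor: float = 1e-10) -> np.ndarray:
    """Per-frame log-energy of a (num_frames, frame_len) array of samples."""
    energy = np.sum(frames.astype(float) ** 2, axis=1)
    return np.log(np.maximum(energy, floor))  # floor avoids log(0) on digital silence


def high_pass(x: np.ndarray, a: float = 0.9) -> np.ndarray:
    """Assumed first-order IIR high-pass filter: y[n] = x[n] - x[n-1] + a * y[n-1].

    The differencing term suppresses the slowly varying level of the log-energy
    contour; `a` is a hypothetical leak coefficient.
    """
    y = np.zeros(len(x), dtype=float)  # y[0] = 0 by convention in this sketch
    for n in range(1, len(x)):
        y[n] = x[n] - x[n - 1] + a * y[n - 1]
    return y


def silence_energy_normalization(log_e, a=0.9, silence_value=-10.0):
    """Sketch of SEN: classify frames with the high-pass filtered log-energy,
    set non-speech frames to a constant, and keep speech frames unchanged.

    Returns (normalized log-energy, boolean speech mask). The threshold (the
    utterance mean of the filtered contour) and `silence_value` are hypothetical.
    """
    log_e = np.asarray(log_e, dtype=float)
    y = high_pass(log_e, a)
    is_speech = y > y.mean()
    normalized = np.where(is_speech, log_e, silence_value)
    return normalized, is_speech


def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-utterance mean and variance normalization over frames (axis 0)."""
    mu = features.mean(axis=0)
    sigma = np.maximum(features.std(axis=0), eps)  # guard against constant dimensions
    return (features - mu) / sigma


def sen_then_cmvn(features: np.ndarray, log_e_col: int = -1, **sen_kwargs) -> np.ndarray:
    """Hypothetical combination: SEN on the log-energy column, then CMVN on all dims."""
    out = np.array(features, dtype=float)  # copy so the caller's array is untouched
    out[:, log_e_col], _ = silence_energy_normalization(out[:, log_e_col], **sen_kwargs)
    return cmvn(out)
```

For example, on a 60-frame log-energy contour that is 0 for 20 frames, 10 for the next 20, and 0 again for the last 20, calling `silence_energy_normalization(contour, a=0.9, silence_value=-5.0)` marks the middle 20 frames as speech and leaves them at 10, while every other frame is set to -5. The recursion is written as an explicit loop for readability rather than speed.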
Similar resources
Silence feature normalization for robust speech recognition in additive noise environments
In this paper, we propose a simple yet very effective feature compensation scheme for two energy-related features, the logarithmic energy (logE) and the zeroth cepstral coefficient (c0), in order to improve their noise robustness. This compensation scheme, named silence feature normalization (SFN), uses the high-pass filtered features as the indicator for speech/non-speech classification, and t...
Robust Speech Detection using SEM and SFN
In speech recognition, a major cause of performance degradation is the mismatch between the model training and recognition environments. Silence feature normalization is one way to reduce this mismatch. This work examines the existing silence feature normalization method at low signal-to-noise ratios. Increasing the energy level of the silence interval for speech and non-speech cla...
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
端點偵測技術在強健語音參數擷取之研究 (Study of the Voice Activity Detection Techniques for Robust Speech Feature Extraction) [In Chinese]
The performance of a speech recognition system is often degraded by the mismatch between the development and application environments. One of the major sources of this mismatch is additive noise. Approaches to handling additive noise can be divided into three classes: speech enhancement, robust speech feature extraction, and compensation of speech mode...
強健性語音辨識中能量相關特徵之改良式正規化技術的研究 (Study of the Improved Normalization Techniques of Energy-Related Features for Robust Speech Recognition) [In Chinese]
The rapid development of speech processing techniques has allowed them to be applied successfully in a growing number of applications, such as automatic dialing, voice-based information retrieval, and identity authentication. However, unexpected variations in speech signals degrade the performance of speech processing systems and thus limit their range of application. Among these variat...